QUASI-METRIC SPACES WITH MEASURE

ALEKSANDAR STOJMIROVIC 

Abstract. The phenomenon of concentration of measure on high-dimensional structures is usually stated in terms of a metric space with a Borel measure, also called an mm-space. We extend some of the mm-space concepts to the setting of a quasi-metric space with probability measure (pq-space). Our motivation comes from biological sequence comparison: we show that many common similarity measures on biological sequences can be converted to quasi-metrics. We show that a high-dimensional pq-space is very close to being an mm-space.



1. Introduction 

Definition 1.1. Let $X$ be a set. A mapping $q : X \times X \to \mathbb{R}_+$ is called a quasi-metric if

(i) for all $x, y \in X$, $q(x,y) = q(y,x) = 0 \iff x = y$,

(ii) for all $x, y, z \in X$, $q(x,z) \le q(x,y) + q(y,z)$.

If $q$ is also symmetric, that is, $q(x,y) = q(y,x)$ for all $x, y \in X$, then $q$ is a metric. For each quasi-metric $q$, we denote by $\bar{q}$ its conjugate quasi-metric, where $\bar{q}(x,y) = q(y,x)$. Furthermore, we call the metric $\hat{q}$, defined for each $x, y \in X$ by $\hat{q}(x,y) = \max\{q(x,y), q(y,x)\} = \max\{q(x,y), \bar{q}(x,y)\}$, its associated metric. The pair $(X, q)$ is called a quasi-metric space.

Let $w$ be a real-valued function on $X$. The triple $(X, q, w)$ is called a generalised weighted quasi-metric space [21] if for all $x, y \in X$

$$q(x,y) + w(x) = q(y,x) + w(y).$$

If, in addition, $w$ is positive, $(X, q, w)$ is called a weighted quasi-metric space.

Due to asymmetry, many metric space structures naturally correspond to two quasi-metric structures, which will henceforth be referred to as the left and right structures wherever possible.
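The axioms of Definition 1.1, the conjugate and associated metric, and the weight condition can be checked mechanically on a finite sample of points. The following Python sketch is only an illustration: the helper names (`is_quasi_metric`, `conjugate`, `associated_metric`, `is_weighted`) are hypothetical, and the example $q(x,y) = \max\{y - x, 0\}$ on a few reals with weight $w(x) = x$ is chosen for illustration, not taken from the text.

```python
from itertools import product


def is_quasi_metric(points, q, tol=1e-12):
    """Check non-negativity and axioms (i), (ii) of Definition 1.1 on a finite set."""
    for x, y in product(points, repeat=2):
        if q(x, y) < -tol:
            return False
        # Axiom (i): q(x, y) = q(y, x) = 0 holds exactly when x = y.
        both_zero = abs(q(x, y)) <= tol and abs(q(y, x)) <= tol
        if both_zero != (x == y):
            return False
    # Axiom (ii): the triangle inequality.
    return all(q(x, z) <= q(x, y) + q(y, z) + tol
               for x, y, z in product(points, repeat=3))


def conjugate(q):
    """The conjugate quasi-metric q_bar(x, y) = q(y, x)."""
    return lambda x, y: q(y, x)


def associated_metric(q):
    """The associated metric q_hat(x, y) = max(q(x, y), q(y, x))."""
    return lambda x, y: max(q(x, y), q(y, x))


def is_weighted(points, q, w, tol=1e-12):
    """Check q(x, y) + w(x) = q(y, x) + w(y) for all pairs of sample points."""
    return all(abs(q(x, y) + w(x) - q(y, x) - w(y)) <= tol
               for x, y in product(points, repeat=2))


# Illustrative example: q(x, y) = max(y - x, 0) on a few real numbers.
pts = [0.0, 0.5, 1.0, 2.5]
q = lambda x, y: max(y - x, 0.0)

print(is_quasi_metric(pts, q))                     # True
print(is_quasi_metric(pts, associated_metric(q)))  # True: q_hat(x, y) = |x - y|
print(is_weighted(pts, q, lambda x: x))            # True
```

Checking finitely many points cannot prove the axioms on an infinite set; it is a sanity check that is exhaustive only when $X$ itself is finite.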

Definition 1.2. Let $(X, q)$ be a quasi-metric space, $x \in X$, $A, B \subseteq X$ and $\varepsilon > 0$. Denote by

• $\operatorname{diam}(A) := \sup\{q(x,y) : x, y \in A\}$, the diameter of set $A$;

• $B^{L}_{\varepsilon}(x) := \{y \in X : q(x,y) < \varepsilon\}$, the left open ball of radius $\varepsilon$ centered at $x$;

• $B^{R}_{\varepsilon}(x) := \{y \in X : q(y,x) < \varepsilon\}$, the right open ball of radius $\varepsilon$ centered at $x$;

• $B_{\varepsilon}(x) := \{y \in X : \hat{q}(x,y) < \varepsilon\}$, the associated metric open ball of radius $\varepsilon$ centered at $x$;

• $q(x, A) := \inf\{q(x,y) : y \in A\}$, the left distance from $x$ to $A$;

• $q(A, x) := \inf\{q(y,x) : y \in A\}$, the right distance from $x$ to $A$;

• $\hat{q}(x, A) := \inf\{\hat{q}(x,y) : y \in A\}$, the associated metric distance from $x$ to $A$;

• $A^{L}_{\varepsilon} := \{x \in X : q(A, x) < \varepsilon\}$, the left $\varepsilon$-neighbourhood of $A$;

• $A^{R}_{\varepsilon} := \{x \in X : q(x, A) < \varepsilon\}$, the right $\varepsilon$-neighbourhood of $A$;

• $A_{\varepsilon} := \{x \in X : \hat{q}(x, A) < \varepsilon\}$, the associated metric $\varepsilon$-neighbourhood of $A$.
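On a finite quasi-metric space all of these sets can be listed directly. The Python sketch below uses a hypothetical three-point space (a directed cycle $a \to b \to c \to a$ with the distances shown in the table) and illustrative helper names; it shows that left and right objects differ, and that the associated metric neighbourhood lies inside both one-sided neighbourhoods.

```python
# Hypothetical 3-point quasi-metric space (a directed cycle a -> b -> c -> a),
# stored as Q[x][y] = q(x, y).
Q = {
    "a": {"a": 0, "b": 1, "c": 2},
    "b": {"a": 3, "b": 0, "c": 1},
    "c": {"a": 2, "b": 3, "c": 0},
}


def left_ball(Q, x, eps):
    """B^L_eps(x) = {y : q(x, y) < eps}."""
    return {y for y in Q if Q[x][y] < eps}


def right_ball(Q, x, eps):
    """B^R_eps(x) = {y : q(y, x) < eps}."""
    return {y for y in Q if Q[y][x] < eps}


def metric_ball(Q, x, eps):
    """B_eps(x) = {y : max(q(x, y), q(y, x)) < eps}."""
    return {y for y in Q if max(Q[x][y], Q[y][x]) < eps}


def left_nbhd(Q, A, eps):
    """A^L_eps = {x : q(A, x) < eps}, with q(A, x) = min over a in A of q(a, x)."""
    return {x for x in Q if min(Q[a][x] for a in A) < eps}


def right_nbhd(Q, A, eps):
    """A^R_eps = {x : q(x, A) < eps}, with q(x, A) = min over a in A of q(x, a)."""
    return {x for x in Q if min(Q[x][a] for a in A) < eps}


def metric_nbhd(Q, A, eps):
    """A_eps with respect to the associated metric."""
    return {x for x in Q if min(max(Q[a][x], Q[x][a]) for a in A) < eps}


# The sets A passed in must be non-empty (a minimum over an empty set is undefined).
print(sorted(left_ball(Q, "a", 1.5)), sorted(right_ball(Q, "a", 1.5)))  # ['a', 'b'] ['a']
print(sorted(left_nbhd(Q, {"a", "b"}, 1.5)),
      sorted(right_nbhd(Q, {"a", "b"}, 1.5)))  # ['a', 'b', 'c'] ['a', 'b']
```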

Each quasi-metric $q$ naturally induces a $T_0$ topology $\mathcal{T}(q)$ whereby a set $U$ is open if for each $x \in U$ there is $\varepsilon > 0$ such that $B^{L}_{\varepsilon}(x) \subseteq U$. The topology $\mathcal{T}(\bar{q})$ can be similarly defined by using the right balls as its base, and hence a quasi-metric space $(X, q)$ can be naturally associated with a bitopological space $(X, \mathcal{T}(q), \mathcal{T}(\bar{q}))$. Topological aspects of quasi-metric spaces have been very extensively researched: the review by Künzi [5] contains 589 references! Note that a $T_0$ quasi-metric is frequently called a quasi-pseudometric, while the name quasi-metric is reserved for a map $q : X \times X \to \mathbb{R}_+$ which satisfies $q(x,y) = 0 \iff x = y$ instead of axiom (i) in Definition 1.1, and whose associated topology is hence $T_1$.

The main objective of this paper is to generalise various concepts related to the phenomenon of concentration of measure on high-dimensional structures [13, 7, 11], which are usually defined in terms of metric spaces with measure, to quasi-metric spaces with measure. While many constructions from the metric case carry through to the quasi-metric case without much change, some quasi-metric results have only trivial analogues in the metric case. We will show that, in a natural sense to be defined later,

A 'high-dimensional' quasi-metric space is, typically, very close to being a metric space.

Before proceeding, we will examine our motivation and, in doing so, provide another example of a quasi-metric space which, we believe, has not been observed before.

2. Motivation: biological sequences 

Consider sets of finite sequences over a finite alphabet $\Sigma$, that is, subsets of the set $\Sigma^*$ of all finite sequences over $\Sigma$. Examples of such sets are the set of all English words and, most importantly for us, sets of DNA or protein sequences. DNA sequences are formed from the four-letter alphabet $\Sigma = \{A, C, G, T\}$, while the protein alphabet consists of 20 amino acids.

Searching DNA and protein sequence datasets [2] by similarity is of fundamental importance in contemporary life sciences. The most basic search, performed using software tools such as BLAST [2], is the range similarity search: given a query sequence, find all sequences in the dataset that are close neighbours of the query with respect to some similarity measure.

The main similarity measure used is the Smith-Waterman local similarity score. We will endeavour to produce one of its many equivalent definitions and show that, under certain conditions which are satisfied in most common practical cases, it can be converted to a (generalised weighted) quasi-metric.

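Definition 2.1. Let $A \subseteq \mathbb{N}$ such that $|A| = n \in \mathbb{N}$. Denote by $A_i$, where $i \le n$, the $i$-th element of $A$ (under the usual order on $\mathbb{N}$). If $I \subseteq \{1, 2, \ldots, n\}$, set $A_I = \{A_i \in A \mid i \in I\}$.

Denote by $g : 2^{\mathbb{N}} \to \mathbb{R}$ a gap penalty satisfying:

(1) $\forall A \subseteq \mathbb{N}$, $g(A) \ge 0$, and

(2) $\forall A, B \subseteq \mathbb{N}$, $A \subseteq B \implies g(A) \le g(B)$.

Let $\Sigma$ be a finite alphabet. For any sequence $x \in \Sigma^n$, $n \in \mathbb{N}$, and any set $A \subseteq \{1, 2, \ldots, n\}$, let $x_A$ denote the subsequence $x_{A_1} x_{A_2} \cdots x_{A_k}$, where $|A| = k$. Let $S : \Sigma \times \Sigma \to \mathbb{R}$ be a map and $x \in \Sigma^m$, $y \in \Sigma^n$, $m, n \in \mathbb{N}$. Define the local similarity score $s : \Sigma^* \times \Sigma^* \to \mathbb{R}$ by

$$s(x,y) = \max_{A, \tilde{A}, B, \tilde{B}} \left\{ T(x_A, y_B) - g(\tilde{A}) - g(\tilde{B}) \right\},$$

where $A \subseteq \{1, 2, \ldots, m\}$, $B \subseteq \{1, 2, \ldots, n\}$, $|A| = |B| = k$, $\tilde{A} = \{A_1, A_1 + 1, \ldots, A_k - 1, A_k\} \setminus A$, $\tilde{B} = \{B_1, B_1 + 1, \ldots, B_k - 1, B_k\} \setminus B$ and $T(x_A, y_B) = \sum_{i=1}^{k} S(x_{A_i}, y_{B_i})$.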

The above definition can be interpreted in the following way. Firstly, two contiguous subsequences $x'$ and $y'$, of $x$ and $y$ respectively, are chosen, which is why the similarity score is called local. Secondly, each letter of $x'$ and $y'$ is either aligned with a letter from the other subsequence or deleted. The scores for aligned letters are given by $S$, while the costs of deletions are given by the gap penalty. Gap penalty functions may depend not only on the number of gaps but also on their locations: contiguous gaps often have a lower cost associated with them. Hence, the local similarity score is the score of the best local alignment of the two sequences given the gap penalties.
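For the special case of a linear gap penalty $g(A) = d\,|A|$ with $d \ge 0$ (which satisfies conditions (1) and (2)), the maximum in Definition 2.1 should coincide with the value of the standard Smith-Waterman dynamic-programming recursion, so no enumeration of index sets is needed. The Python sketch below assumes such a linear penalty and a toy letter score (+2 for identical letters, -1 otherwise); both are illustrative choices, not values from the text. It also forms $s(x,x) - s(x,y)$, the quantity used in Lemma 2.2.

```python
def local_score(x, y, S, d):
    """Best local alignment score of x and y with a linear gap penalty d per gap.

    H[i][j] is the best score of a local alignment ending at x[i-1] and y[j-1];
    0 stands for the empty alignment, so scores never drop below 0.
    """
    m, n = len(x), len(y)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + S(x[i - 1], y[j - 1]),  # align x_i with y_j
                H[i - 1][j] - d,                          # x_i is deleted (gap)
                H[i][j - 1] - d,                          # y_j is deleted (gap)
            )
            best = max(best, H[i][j])
    return best


def toy_S(a, b):
    """Illustrative letter score: +2 for identical letters, -1 otherwise."""
    return 2 if a == b else -1


def q_from_score(x, y, d=1):
    """The quantity s(x, x) - s(x, y) considered in Lemma 2.2."""
    return local_score(x, x, toy_S, d) - local_score(x, y, toy_S, d)


print(local_score("ACGT", "AGT", toy_S, 1))                      # 5
print(q_from_score("ACGT", "AGT"), q_from_score("AGT", "ACGT"))  # 3 1
```

The asymmetry comes from the sequences having different lengths: the score itself is symmetric here, but $s(x,x)$ is larger for the longer sequence. With a linear penalty every gap position costs the same; penalties that make contiguous gaps cheaper (affine penalties) require the Gotoh extension of this recursion.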

The following result allows us to convert similarity scores to quasi-metrics.

Lemma 2.2. Let $X$ be a set and $s : X \times X \to \mathbb{R}$ a map such that

(1) $s(x,x) \ge s(x,y)$ for all $x, y \in X$,

(2) $s(x,y) = s(x,x) \wedge s(y,x) = s(y,y) \implies x = y$ for all $x, y \in X$,

(3) $s(x,y) + s(y,z) \le s(x,z) + s(y,y)$ for all $x, y, z \in X$.

Then $q : X \times X \to \mathbb{R}$, where $q(x,y) = s(x,x) - s(x,y)$, is a quasi-metric. Furthermore, if $s$ is symmetric, that is, $s(x,y) = s(y,x)$ for all $x, y \in X$, then $q$ is a generalised weighted quasi-metric with the weight function $w : x \mapsto -s(x,x)$.

Proof. Positivity of $q$ is equivalent to (1), separation of points is equivalent to (2), while the triangle inequality is equivalent to (3). If $s(x,y) = s(y,x)$, then

$$q(x,y) + w(x) = s(x,x) - s(x,y) - s(x,x) = -s(x,y) = -s(y,x) = s(y,y) - s(y,x) - s(y,y) = q(y,x) + w(y),$$

and thus $w : x \mapsto -s(x,x)$ is a generalised weight. □
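For a finite set $X$ the three hypotheses of Lemma 2.2 can be checked exhaustively and the resulting quasi-metric tabulated. The Python sketch below does this for a hypothetical symmetric two-letter similarity table (values chosen only for illustration; helper names are ours). It also checks the identity $q(x,y) - q(y,x) = s(x,x) - s(y,y)$, which holds for symmetric $s$ and underlies the weighted structure.

```python
from itertools import product


def lemma_conditions(X, s):
    """Truth values of conditions (1), (2), (3) of Lemma 2.2 on a finite set X."""
    c1 = all(s(x, x) >= s(x, y) for x, y in product(X, repeat=2))
    c2 = all(x == y or s(x, y) != s(x, x) or s(y, x) != s(y, y)
             for x, y in product(X, repeat=2))
    c3 = all(s(x, y) + s(y, z) <= s(x, z) + s(y, y)
             for x, y, z in product(X, repeat=3))
    return c1, c2, c3


def to_quasi_metric(s):
    """q(x, y) = s(x, x) - s(x, y)."""
    return lambda x, y: s(x, x) - s(x, y)


# Hypothetical symmetric similarity table on a two-letter set.
TABLE = {("A", "A"): 4, ("B", "B"): 2, ("A", "B"): 1, ("B", "A"): 1}
s = lambda x, y: TABLE[(x, y)]
X = ["A", "B"]

print(lemma_conditions(X, s))    # (True, True, True)
q = to_quasi_metric(s)
print(q("A", "B"), q("B", "A"))  # 3 1  (asymmetric)
print(all(q(x, y) - q(y, x) == s(x, x) - s(y, y)
          for x, y in product(X, repeat=2)))  # True
```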

Theorem 2.3. Suppose $S : \Sigma \times \Sigma \to \mathbb{R}$ satisfies the conditions of Lemma 2.2 and $S(a,a) > 0$ for all $a \in \Sigma$. Then so does the similarity score $s$ on $\Sigma^*$ as defined in Definition 2.1.



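Proof. It is easy to see that if $S$ satisfies the conditions of Lemma 2.2, so does $T$. Since $S(a,a) > 0$ for all $a \in \Sigma$, it is clear that $s(x,x) = T(x,x)$ and thus $s(x,x) \ge s(x,y)$ for all $x, y \in \Sigma^*$.

If $s(x,y) = s(x,x)$, then $s(x,y) = T(x,x)$ and hence $x$ is a subsequence of $y$. Similarly, if $s(y,x) = s(y,y)$, then $y$ is a subsequence of $x$. Thus, $s(x,y) = s(x,x) \wedge s(y,x) = s(y,y) \implies x = y$.

To prove the third statement, pick $A, \tilde{A}, B, \tilde{B}, C, \tilde{C}, D, \tilde{D}$ such that

$$s(x,y) = T(x_A, y_B) - g(\tilde{A}) - g(\tilde{B}) \quad\text{and}\quad s(y,z) = T(y_C, z_D) - g(\tilde{C}) - g(\tilde{D}).$$

Let $I$ and $J$ be the sets of indices (possibly empty) of $A$ and $B$, and of $C$ and $D$, respectively, such that $B_I = C_J = B \cap C$. It is clear that $|I| = |J|$. Denote by $K$ and $L$ the remaining indices of $B$ and $C$ respectively, that is, the sets such that $B_K = B \setminus C$ and $C_L = C \setminus B$. Since $T$ is a sum over sets of indices, we have

$$T(x_A, y_B) = T(x_{A_I}, y_{B_I}) + T(x_{A_K}, y_{B_K}) \quad\text{and}\quad T(y_C, z_D) = T(y_{C_J}, z_{D_J}) + T(y_{C_L}, z_{D_L}).$$

Furthermore, let $\tilde{A}_I$ and $\tilde{D}_J$ be the corresponding sets of gaps, that is,

$$\tilde{A}_I = \{A_{I_1}, A_{I_1} + 1, \ldots, A_{I_{|I|}} - 1, A_{I_{|I|}}\} \setminus A_I \quad\text{and}\quad \tilde{D}_J = \{D_{J_1}, D_{J_1} + 1, \ldots, D_{J_{|J|}} - 1, D_{J_{|J|}}\} \setminus D_J.$$

Since $I$ and $J$ are subsets of indices of $A$ and $D$ respectively, $\tilde{A}_I \subseteq \tilde{A}$ and $\tilde{D}_J \subseteq \tilde{D}$, and hence $g(\tilde{A}_I) \le g(\tilde{A})$ and $g(\tilde{D}_J) \le g(\tilde{D})$.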






Thus,

$$\begin{aligned}
s(x,y) + s(y,z) &= T(x_A, y_B) - g(\tilde{A}) - g(\tilde{B}) + T(y_C, z_D) - g(\tilde{C}) - g(\tilde{D}) \\
&\le T(x_{A_I}, y_{B_I}) + T(x_{A_K}, y_{B_K}) - g(\tilde{A}_I) + T(y_{C_J}, z_{D_J}) + T(y_{C_L}, z_{D_L}) - g(\tilde{D}_J) \\
&\le T(x_{A_I}, z_{D_J}) - g(\tilde{A}_I) - g(\tilde{D}_J) + T(y_{B_I}, y_{B_I}) + T(y_{B_K}, y_{B_K}) + T(y_{C_L}, y_{C_L}).
\end{aligned}$$

Observing that $T(x_{A_I}, z_{D_J}) - g(\tilde{A}_I) - g(\tilde{D}_J) \le s(x,z)$ and, since $B_I$, $B_K$ and $C_L$ are disjoint subsets of indices of $y$, that $T(y_{B_I}, y_{B_I}) + T(y_{B_K}, y_{B_K}) + T(y_{C_L}, y_{C_L}) \le T(y,y) = s(y,y)$, completes the proof. □

The conditions of Lemma 2.2 are satisfied by most of the BLOSUM [8] similarity score matrices on the amino acid alphabet, which are produced in the following way. Biologically closely related fragments of protein sequences are clustered together in the form of multiple alignments, or blocks, so that each row in a block represents a different fragment. The fragments within blocks are further clustered to reduce the effect of too closely related fragments, and the relative frequency of observing amino acid $i$ in the same column as amino acid $j$ is denoted $\varphi_{ij}$ (this is the aggregate over all columns and over all blocks). The similarity score $S$ is given by

$$S(i,j) = \log \frac{\varphi_{ij}}{\psi_i \psi_j},$$

where $\psi_i$ is the overall frequency of amino acid $i$. Hence $S$ is symmetric, and it is easy to see that the triangle inequality of the quasi-metric obtained by the transformation from Lemma 2.2 is equivalent to

$$\varphi_{ij} \varphi_{jk} \le \varphi_{ik} \varphi_{jj}$$

for all amino acids $i, j, k$. In many cases the frequencies of two different amino acids being aligned are much smaller than the frequencies of amino acids being aligned with themselves, and the triangle inequality is satisfied.

BLOSUM matrices are the most frequently used score matrices for similarity search of protein sequences and, as can be seen from above, they are also symmetric, so that the quasi-metric obtained is generalised weighted. The similarity measures on the DNA alphabet produce a metric, but the distance derived from the local similarity score on DNA sequences of different lengths is still asymmetric.
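The criterion $\varphi_{ij}\varphi_{jk} \le \varphi_{ik}\varphi_{jj}$ can be checked directly on a table of pair frequencies. The Python sketch below uses a hypothetical symmetric 3-letter frequency table (not real BLOSUM data), takes $\psi_i$ to be the row sums of the table (an assumption of this sketch), builds log-odds scores $\log(\varphi_{ij}/(\psi_i\psi_j))$, and shows that the product criterion and condition (3) of Lemma 2.2 for the scores agree, both on this table and on a deliberately bad one.

```python
import math
from itertools import product


def product_criterion(phi):
    """True if phi[i][j] * phi[j][k] <= phi[i][k] * phi[j][j] for all i, j, k."""
    n = len(phi)
    return all(phi[i][j] * phi[j][k] <= phi[i][k] * phi[j][j]
               for i, j, k in product(range(n), repeat=3))


def log_odds(phi):
    """Scores S(i, j) = log(phi_ij / (psi_i psi_j)); psi_i taken as row sums of phi."""
    psi = [sum(row) for row in phi]
    n = len(phi)
    return [[math.log(phi[i][j] / (psi[i] * psi[j])) for j in range(n)]
            for i in range(n)]


def score_triangle(S, tol=1e-12):
    """Condition (3) of Lemma 2.2 for a score matrix: S_ij + S_jk <= S_ik + S_jj."""
    n = len(S)
    return all(S[i][j] + S[j][k] <= S[i][k] + S[j][j] + tol
               for i, j, k in product(range(n), repeat=3))


# Hypothetical symmetric pair frequencies (entries sum to 1); not real BLOSUM data.
PHI = [[0.20, 0.05, 0.02],
       [0.05, 0.25, 0.04],
       [0.02, 0.04, 0.33]]
# A table in which two different letters are aligned too often.
BAD = [[0.10, 0.30],
       [0.30, 0.30]]

for table in (PHI, BAD):
    print(product_criterion(table), score_triangle(log_odds(table)))
# True True
# False False
```

The normalising frequencies $\psi_i$ cancel in condition (3), which is why the two checks agree whatever convention is used for $\psi_i$; the small tolerance only absorbs floating-point rounding in the logarithms.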

Quasi-metrics were investigated quite early in the development of biological sequence comparison algorithms by Waterman, Smith and Beyer, but their emphasis at the time was on global rather than local similarity measures. Much effort was expended on metrics [19, 21], which were abandoned in favour of similarity scores when it was realised that any 'local' distance between two sequences cannot satisfy the triangle inequality.

Most algorithms for similarity search in datasets of biological sequences, even heuristic ones like BLAST [2], scan the whole dataset to retrieve close neighbours of a query point. Our interest is in producing indexing schemes for similarity search [16, 17], in which a dataset is partitioned so that very few points need to be scanned for each search. The performance of indexing schemes depends on many factors, but it was observed [11] that the so-called 'curse of dimensionality', where many indexing schemes for high-dimensional spaces perform worse than a sequential scan, can be largely explained by the concentration of measure phenomenon. The results in [11] refer only to metric spaces, and the aim of this study is to produce foundations for studying similar phenomena in quasi-metric spaces.

It should be noted that all datasets, biological or otherwise, are finite and hence topologically discrete and zero-dimensional. However, they also carry an additional structure: the normalised counting measure. Hence, each finite quasi-metric space automatically becomes a quasi-metric space with measure.



3. PQ-SPACES 

The main object of our study is the pq-space, a quasi-metric space with a Borel probability measure. As two topologies can be associated with a quasi-metric, it is appropriate to use the Borel structure generated by $\mathcal{T}(q) \cup \mathcal{T}(\bar{q})$, so that any countable union, intersection or difference of 'left' or 'right' open sets is measurable. It is easy to see that this structure is equivalent to the Borel structure generated by $\mathcal{T}(\hat{q})$, the topology of the associated metric, since $B_{\varepsilon}(x) = B^{L}_{\varepsilon}(x) \cap B^{R}_{\varepsilon}(x)$.
Definition 3.1. Let $(X, q)$ be a quasi-metric space, and $\mu$ a probability measure over $\mathcal{B}$, a Borel $\sigma$-algebra of measurable sets generated by $\mathcal{T}(\hat{q})$. We call the triple $(X, q, \mu)$ a pq-space.

The pq-space is the quasi-metric analogue of the metric space with Borel measure (mm- or pm-space, depending on whether the total measure is unity) defined by Gromov and Milman [6, 7, 15]. For a metric space with measure, the concentration effects are expressed in terms of the concentration function. Two such functions, left and right, can be defined for a pq-space.

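Definition 3.2. Let $(X, q, \mu)$ be a pq-space and $\mathcal{B}$ the Borel $\sigma$-algebra of $\mu$-measurable sets. The left concentration function $\alpha^{L}_{(X,q,\mu)}$, also denoted $\alpha^{L}$, is a map $\mathbb{R}_+ \to [0, \frac{1}{2}]$ such that $\alpha^{L}(0) = \frac{1}{2}$ and

$$\alpha^{L}(\varepsilon) = \sup\left\{1 - \mu(A^{L}_{\varepsilon}) : A \in \mathcal{B},\ \mu(A) \ge \tfrac{1}{2}\right\} \quad \text{for } \varepsilon > 0.$$

Similarly, the right concentration function $\alpha^{R}_{(X,q,\mu)}$, also denoted $\alpha^{R}$, is a map $\mathbb{R}_+ \to [0, \frac{1}{2}]$ such that $\alpha^{R}(0) = \frac{1}{2}$ and

$$\alpha^{R}(\varepsilon) = \sup\left\{1 - \mu(A^{R}_{\varepsilon}) : A \in \mathcal{B},\ \mu(A) \ge \tfrac{1}{2}\right\} \quad \text{for } \varepsilon > 0.$$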

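For an mm-space $(X, d, \mu)$, $\alpha^{L}$ and $\alpha^{R}$ coincide, and in that case they will be denoted $\alpha_{(X,d,\mu)}$ or just $\alpha$. It is obvious that if $\operatorname{diam}(X)$ is finite, then $\alpha^{L}(\varepsilon) = \alpha^{R}(\varepsilon) = 0$ for all $\varepsilon > \operatorname{diam}(X)$, and it can be shown that $\alpha^{L}$ and $\alpha^{R}$ are non-increasing.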
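On a finite pq-space the suprema in Definition 3.2 run over finitely many sets, so $\alpha^L$ and $\alpha^R$ can be computed by brute force. The Python sketch below is exponential in $|X|$ and is meant only for small illustrative examples; the three-point 'chain' $a \to m \to b$ (forward steps cost 1, going backwards costs 10) and its measure are hypothetical, and the helper names are ours. Exact rational arithmetic avoids rounding at the threshold $\mu(A) \ge 1/2$.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)


def concentration(points, q, mu, eps, side):
    """Brute-force alpha^L (side='L') or alpha^R (side='R') of a finite pq-space."""
    if eps == 0:
        return HALF                       # alpha(0) = 1/2 by definition
    best = Fraction(0)
    for r in range(1, len(points) + 1):
        for A in combinations(points, r):
            if sum(mu[a] for a in A) < HALF:
                continue                  # only sets with mu(A) >= 1/2 count
            if side == "L":               # A^L_eps = {x : q(A, x) < eps}
                nbhd = [x for x in points if min(q(a, x) for a in A) < eps]
            else:                         # A^R_eps = {x : q(x, A) < eps}
                nbhd = [x for x in points if min(q(x, a) for a in A) < eps]
            best = max(best, 1 - sum(mu[x] for x in nbhd))
    return best


# Hypothetical chain a -> m -> b: forward distance = number of steps, backward = 10.
POS = {"a": 0, "m": 1, "b": 2}
MU = {"a": Fraction(1, 4), "m": Fraction(1, 2), "b": Fraction(1, 4)}


def q(u, w):
    return POS[w] - POS[u] if POS[w] >= POS[u] else 10


for eps in (Fraction(1, 2), 3, 11):
    print(eps, concentration(list(MU), q, MU, eps, "L"),
          concentration(list(MU), q, MU, eps, "R"))
```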

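Lemma 3.3. For any pq-space $(X, q, \mu)$ and each $\varepsilon > 0$,

$$\max\{\alpha^{L}(\varepsilon), \alpha^{R}(\varepsilon)\} \le \alpha(\varepsilon) \le \alpha^{L}(\varepsilon) + \alpha^{R}(\varepsilon),$$

where $\alpha$ is the concentration function of the associated mm-space $(X, \hat{q}, \mu)$.

Proof. Let $A \in \mathcal{B}$ such that $\mu(A) \ge \frac{1}{2}$ and $\varepsilon > 0$. Using $A_{\varepsilon} \subseteq A^{L}_{\varepsilon} \cap A^{R}_{\varepsilon}$,

$$1 - \mu(A^{L}_{\varepsilon}) \le 1 - \mu(A_{\varepsilon}) \le \alpha(\varepsilon) \implies \alpha^{L}(\varepsilon) \le \alpha(\varepsilon) \quad\text{and}$$
$$1 - \mu(A^{R}_{\varepsilon}) \le 1 - \mu(A_{\varepsilon}) \le \alpha(\varepsilon) \implies \alpha^{R}(\varepsilon) \le \alpha(\varepsilon),$$

and it follows that $\max\{\alpha^{L}(\varepsilon), \alpha^{R}(\varepsilon)\} \le \alpha(\varepsilon)$. For the second inequality, use $A_{\varepsilon} \supseteq A^{L}_{\varepsilon} \cap A^{R}_{\varepsilon}$, and thus $X \setminus A_{\varepsilon} \subseteq (X \setminus A^{L}_{\varepsilon}) \cup (X \setminus A^{R}_{\varepsilon})$, implying

$$1 - \mu(A_{\varepsilon}) \le (1 - \mu(A^{L}_{\varepsilon})) + (1 - \mu(A^{R}_{\varepsilon})) \le \alpha^{L}(\varepsilon) + \alpha^{R}(\varepsilon).$$

□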
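The first inequality of Lemma 3.3 can be observed on a small example. The Python sketch below (hypothetical three-point chain $a \to m \to b$, brute force, exact arithmetic; helper names are ours) computes $\alpha^L$, $\alpha^R$ and the concentration function of the associated metric $\hat q = \max\{q, \bar q\}$ at a few radii and asserts $\max\{\alpha^L, \alpha^R\} \le \alpha$. On this particular example one also finds $\alpha(3) = \alpha^L(3) + \alpha^R(3) = 1/2$.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)
POS = {"a": 0, "m": 1, "b": 2}
MU = {"a": Fraction(1, 4), "m": Fraction(1, 2), "b": Fraction(1, 4)}


def q(u, w):
    """Hypothetical chain quasi-metric: forward steps cost 1, going backwards costs 10."""
    return POS[w] - POS[u] if POS[w] >= POS[u] else 10


def q_hat(u, w):
    """Associated metric: max(q(u, w), q(w, u))."""
    return max(q(u, w), q(w, u))


def sup_over_big_sets(dist_to_set, eps):
    """sup of 1 - mu({x : dist_to_set(A, x) < eps}) over sets A with mu(A) >= 1/2."""
    pts, best = list(MU), Fraction(0)
    for r in range(1, len(pts) + 1):
        for A in combinations(pts, r):
            if sum(MU[p] for p in A) >= HALF:
                nbhd = [x for x in pts if dist_to_set(A, x) < eps]
                best = max(best, 1 - sum(MU[x] for x in nbhd))
    return best


def alphas(eps):
    """(alpha^L, alpha^R, alpha of the associated metric) at radius eps > 0."""
    a_left = sup_over_big_sets(lambda A, x: min(q(p, x) for p in A), eps)     # q(A, x)
    a_right = sup_over_big_sets(lambda A, x: min(q(x, p) for p in A), eps)    # q(x, A)
    a_hat = sup_over_big_sets(lambda A, x: min(q_hat(p, x) for p in A), eps)  # q_hat
    return a_left, a_right, a_hat


for eps in (Fraction(1, 2), Fraction(3, 2), 3, 11):
    a_left, a_right, a_hat = alphas(eps)
    assert max(a_left, a_right) <= a_hat   # first inequality of Lemma 3.3
    print(eps, a_left, a_right, a_hat)
```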

The phenomenon of concentration of measure on high-dimensional structures refers to the observation that in many high-dimensional metric spaces with measure the concentration function decreases very sharply, that is, an $\varepsilon$-neighbourhood of any not vanishingly small set, even for very small $\varepsilon$, covers (in terms of the probability measure) almost the whole space. Examples are numerous and come from many diverse branches of mathematics [12, 23]. In this paper we will take a high-dimensional pq-space to be a pq-space where both $\alpha^{L}$ and $\alpha^{R}$ decrease sharply.

3.1. Deviation Inequalities. 

Definition 3.4. Let $(X, q)$ be a quasi-metric space. A map $f : X \to \mathbb{R}$ is called left $K$-Lipschitz if there exists $K \in \mathbb{R}_+$ such that for all $x, y \in X$

$$f(x) - f(y) \le K q(x,y).$$

The constant $K$ is called a Lipschitz constant. Similarly, $f$ is right $K$-Lipschitz if $f(y) - f(x) \le K q(x,y)$. Maps that are both left and right $K$-Lipschitz are called $K$-Lipschitz.

Left 1-Lipschitz functions were studied by Romaguera and Sanchis under the name of semi-Lipschitz functions and used to obtain some best approximation results. We use the above terms for consistency with the remainder of our terminology. For example, it is easy to verify that the functions measuring the left or right distances to a fixed point or a set are respectively left or right 1-Lipschitz.
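The last remark can be checked on a small example. The Python sketch below (hypothetical three-point space, helper names are ours) tests the left and right 1-Lipschitz conditions of Definition 3.4 for the left distance $x \mapsto q(x,A)$ and the right distance $x \mapsto q(A,x)$; on this example each function satisfies exactly the condition predicted for it.

```python
from itertools import product

# Hypothetical 3-point quasi-metric space (directed cycle), Q[x][y] = q(x, y).
Q = {
    "a": {"a": 0, "b": 1, "c": 2},
    "b": {"a": 3, "b": 0, "c": 1},
    "c": {"a": 2, "b": 3, "c": 0},
}


def is_left_lipschitz(f, Q, K=1):
    """f(x) - f(y) <= K q(x, y) for all x, y."""
    return all(f(x) - f(y) <= K * Q[x][y] for x, y in product(Q, repeat=2))


def is_right_lipschitz(f, Q, K=1):
    """f(y) - f(x) <= K q(x, y) for all x, y."""
    return all(f(y) - f(x) <= K * Q[x][y] for x, y in product(Q, repeat=2))


A = {"b"}
dist_to_A = lambda x: min(Q[x][a] for a in A)    # left distance q(x, A)
dist_from_A = lambda x: min(Q[a][x] for a in A)  # right distance q(A, x)

print(is_left_lipschitz(dist_to_A, Q), is_right_lipschitz(dist_to_A, Q))      # True False
print(is_left_lipschitz(dist_from_A, Q), is_right_lipschitz(dist_from_A, Q))  # False True
```

Negating a right 1-Lipschitz function gives a left 1-Lipschitz one, which is the device used below to transfer results between the two sides.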

Definition 3.5. Let $(X, \mathcal{B}, \mu)$ be a probability space and $f$ a measurable real-valued function on $X$. A value $m_f$ is a median or Lévy mean of $f$ for $\mu$ if

$$\mu(\{f \le m_f\}) \ge \tfrac{1}{2} \quad\text{and}\quad \mu(\{f \ge m_f\}) \ge \tfrac{1}{2}.$$

A median need not be unique, but it always exists. The following lemmas are generalisations of the results for mm-spaces.
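Lemma 3.6. Let $(X, q, \mu)$ be a pq-space with left and right concentration functions $\alpha^{L}$ and $\alpha^{R}$ respectively, and $f$ a left 1-Lipschitz function on $(X, q)$ with a median $m_f$. Then for any $\varepsilon > 0$

$$\mu(\{x \in X : f(x) \le m_f - \varepsilon\}) \le \alpha^{L}(\varepsilon) \quad\text{and}\quad \mu(\{x \in X : f(x) \ge m_f + \varepsilon\}) \le \alpha^{R}(\varepsilon).$$

Conversely, if for some non-negative functions $\alpha^{L}_{0}, \alpha^{R}_{0} : \mathbb{R}_+ \to \mathbb{R}$,

$$\mu(\{x \in X : f(x) \le m_f - \varepsilon\}) \le \alpha^{L}_{0}(\varepsilon) \quad\text{and}\quad \mu(\{x \in X : f(x) \ge m_f + \varepsilon\}) \le \alpha^{R}_{0}(\varepsilon)$$

for every left 1-Lipschitz function $f : X \to \mathbb{R}$ with median $m_f$ and every $\varepsilon > 0$, then $\alpha^{L} \le \alpha^{L}_{0}$ and $\alpha^{R} \le \alpha^{R}_{0}$.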

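Proof. Set $A = \{x \in X : f(x) \ge m_f\}$. Take any $y \in X$ such that $f(y) \le m_f - \varepsilon$. Then, for any $x \in A$, $q(x,y) \ge f(x) - f(y) \ge \varepsilon$, and hence $q(A,y) \ge \varepsilon$, implying $y \in X \setminus A^{L}_{\varepsilon}$. Therefore, $\mu(\{x \in X : f(x) \le m_f - \varepsilon\}) \le 1 - \mu(A^{L}_{\varepsilon}) \le \alpha^{L}(\varepsilon)$.

Now set $B = \{x \in X : f(x) \le m_f\}$. Take any $y \in X$ such that $f(y) \ge m_f + \varepsilon$. Then, for any $x \in B$, $q(y,x) \ge f(y) - f(x) \ge \varepsilon$, and hence $q(y,B) \ge \varepsilon$, implying $y \in X \setminus B^{R}_{\varepsilon}$. Thus, $\mu(\{x \in X : f(x) \ge m_f + \varepsilon\}) \le 1 - \mu(B^{R}_{\varepsilon}) \le \alpha^{R}(\varepsilon)$.

The converse is equivalent to finding, for each Borel set $A \subseteq X$ such that $\mu(A) \ge \frac{1}{2}$, left 1-Lipschitz functions $f, g : X \to \mathbb{R}$ with medians $m_f$ and $m_g$ respectively, such that $1 - \mu(A^{L}_{\varepsilon}) \le \mu(\{x \in X : f(x) \le m_f - \varepsilon\})$ and $1 - \mu(A^{R}_{\varepsilon}) \le \mu(\{x \in X : g(x) \ge m_g + \varepsilon\})$.

Let $A \subseteq X$ be such a set and set, for each $y \in X$, $f(y) = -q(A,y)$ and $g(y) = q(y,A)$. It is easy to see that both $f$ and $g$ are left 1-Lipschitz and that $0$ is a median of both, so we may take $m_f = m_g = 0$. If $y \in X \setminus A^{L}_{\varepsilon}$, we have $q(A,y) \ge \varepsilon$ and thus $f(y) \le -\varepsilon$. Similarly, if $y \in X \setminus A^{R}_{\varepsilon}$, we have $q(y,A) \ge \varepsilon$, implying $g(y) \ge \varepsilon$, and the result follows. □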

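Hence, we can state the alternative definitions of $\alpha^{L}$ and $\alpha^{R}$:

$$\alpha^{L}(\varepsilon) = \sup\{\mu(\{x \in X : f(x) \le m_f - \varepsilon\}) : f \text{ is left 1-Lipschitz}\}$$

and

$$\alpha^{R}(\varepsilon) = \sup\{\mu(\{x \in X : f(x) \ge m_f + \varepsilon\}) : f \text{ is left 1-Lipschitz}\}.$$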
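Lemma 3.6 can be illustrated numerically. The Python sketch below (hypothetical three-point chain $a \to m \to b$, brute force, exact arithmetic; helper names are ours) computes a median of a function on a finite probability space, takes the left 1-Lipschitz functions $-q(A,\cdot)$ and $q(\cdot,A)$ used in the proof for every non-empty $A$, and asserts that their lower and upper deviation masses stay below $\alpha^L(\varepsilon)$ and $\alpha^R(\varepsilon)$ respectively.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)
POS = {"a": 0, "m": 1, "b": 2}
MU = {"a": Fraction(1, 4), "m": Fraction(1, 2), "b": Fraction(1, 4)}


def q(u, w):
    """Hypothetical chain quasi-metric: forward steps cost 1, going backwards costs 10."""
    return POS[w] - POS[u] if POS[w] >= POS[u] else 10


def nonempty_subsets():
    pts = list(MU)
    for r in range(1, len(pts) + 1):
        yield from combinations(pts, r)


def alpha(eps, side):
    """Brute-force alpha^L (side='L') or alpha^R (side='R')."""
    best = Fraction(0)
    for A in nonempty_subsets():
        if sum(MU[a] for a in A) < HALF:
            continue
        if side == "L":
            nbhd = [x for x in MU if min(q(a, x) for a in A) < eps]
        else:
            nbhd = [x for x in MU if min(q(x, a) for a in A) < eps]
        best = max(best, 1 - sum(MU[x] for x in nbhd))
    return best


def median(f):
    """Smallest m with mu(f <= m) >= 1/2; then mu(f >= m) >= 1/2 as well."""
    for m in sorted({f(x) for x in MU}):
        if sum(MU[x] for x in MU if f(x) <= m) >= HALF:
            return m


def deviations(f, eps):
    """(mu{f <= m_f - eps}, mu{f >= m_f + eps}) for the median m_f above."""
    m = median(f)
    low = sum((MU[x] for x in MU if f(x) <= m - eps), Fraction(0))
    high = sum((MU[x] for x in MU if f(x) >= m + eps), Fraction(0))
    return low, high


for eps in (Fraction(1, 2), Fraction(3, 2), 3):
    aL, aR = alpha(eps, "L"), alpha(eps, "R")
    for A in nonempty_subsets():
        f = lambda y: -min(q(a, y) for a in A)   # -q(A, y), left 1-Lipschitz
        g = lambda y: min(q(y, a) for a in A)    # q(y, A), left 1-Lipschitz
        for h in (f, g):
            low, high = deviations(h, eps)
            assert low <= aL and high <= aR
print("Lemma 3.6 bounds hold on this example")
```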

Similar results can be easily obtained for right 1-Lipschitz functions by remembering that if $f$ is right 1-Lipschitz, then $-f$ is left 1-Lipschitz. It is also straightforward to observe that the absolute value of the deviation of a left 1-Lipschitz function from a median thus depends on both $\alpha^{L}$ and $\alpha^{R}$.

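Corollary 3.7. For any pq-space $(X, q, \mu)$, a left 1-Lipschitz function $f$ with a median $m_f$ and $\varepsilon > 0$,

$$\mu(\{x \in X : |f(x) - m_f| \ge \varepsilon\}) \le \alpha^{L}(\varepsilon) + \alpha^{R}(\varepsilon).$$

This result reduces to the well-known inequality $\mu(\{x \in X : |f(x) - m_f| \ge \varepsilon\}) \le 2\alpha(\varepsilon)$ when $q$ is a metric. Deviations between the values of a left 1-Lipschitz function at any two points are also bounded by both concentration functions.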

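Lemma 3.8. Let $(X, q, \mu)$ be a pq-space and $f : X \to \mathbb{R}$ a left (or right) 1-Lipschitz function. Then

$$(\mu \otimes \mu)(\{(x,y) \in X \times X : f(x) - f(y) \ge \varepsilon\}) \le \alpha^{L}\!\left(\tfrac{\varepsilon}{2}\right) + \alpha^{R}\!\left(\tfrac{\varepsilon}{2}\right).$$

Proof.

$$\begin{aligned}
(\mu \otimes \mu)(\{(x,y) \in X \times X : f(x) - f(y) \ge \varepsilon\})
&\le (\mu \otimes \mu)(\{(x,y) \in X \times X : f(x) - m_f \ge \tfrac{\varepsilon}{2}\}) \\
&\quad + (\mu \otimes \mu)(\{(x,y) \in X \times X : m_f - f(y) \ge \tfrac{\varepsilon}{2}\}) \\
&= \mu(\{x \in X : f(x) \ge m_f + \tfrac{\varepsilon}{2}\}) + \mu(\{x \in X : f(x) \le m_f - \tfrac{\varepsilon}{2}\}) \\
&\le \alpha^{R}(\tfrac{\varepsilon}{2}) + \alpha^{L}(\tfrac{\varepsilon}{2}).
\end{aligned}$$

□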

3.2. Lévy families.

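Definition 3.9. A sequence of pq-spaces $\{(X_n, q_n, \mu_n)\}_{n=1}^{\infty}$ is called a left Lévy family if the left concentration functions $\alpha^{L}_{(X_n, q_n, \mu_n)}$ converge to $0$ pointwise, that is,

$$\forall \varepsilon > 0, \quad \alpha^{L}_{(X_n, q_n, \mu_n)}(\varepsilon) \to 0 \text{ as } n \to \infty.$$

Similarly, a sequence of pq-spaces $\{(X_n, q_n, \mu_n)\}_{n=1}^{\infty}$ is called a right Lévy family if the right concentration functions $\alpha^{R}_{(X_n, q_n, \mu_n)}$ converge to $0$ pointwise, that is,

$$\forall \varepsilon > 0, \quad \alpha^{R}_{(X_n, q_n, \mu_n)}(\varepsilon) \to 0 \text{ as } n \to \infty.$$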






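A sequence which is both a left and a right Lévy family will be called a Lévy family. Furthermore, if for some constants $C_1, C_2 > 0$ one has $\max\{\alpha^{L}_{n}(\varepsilon), \alpha^{R}_{n}(\varepsilon)\} \le C_1 \exp(-C_2 \varepsilon^2 n)$, where $\alpha^{L}_{n} = \alpha^{L}_{(X_n, q_n, \mu_n)}$ and similarly for $\alpha^{R}_{n}$, such a sequence is called a normal Lévy family.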

It is a straightforward corollary of Lemma 3.3 that a sequence of pq-spaces $\{(X_n, q_n, \mu_n)\}_{n=1}^{\infty}$ is a Lévy family if and only if the sequence of associated mm-spaces $\{(X_n, \hat{q}_n, \mu_n)\}_{n=1}^{\infty}$ is a Lévy family.

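To illustrate the existence of sequences of pq-spaces which are right but not left Lévy families, consider the following example.

Let $X = \{a, b\}$ with $\mu(\{a\}) = p$ and $\mu(\{b\}) = 1 - p$ for some fixed $p \in (\frac{1}{2}, 1)$. Set $q_n(a,b) = 1$ and $q_n(b,a) = \frac{1}{n}$, where $n \in \mathbb{N}_+$.

It is clear that

$$\alpha^{L}_{(X, q_n, \mu)}(\varepsilon) = \begin{cases} 1 - p & \text{if } 0 < \varepsilon \le 1, \\ 0 & \text{if } \varepsilon > 1, \end{cases} \qquad \alpha^{R}_{(X, q_n, \mu)}(\varepsilon) = \begin{cases} 1 - p & \text{if } 0 < \varepsilon \le \frac{1}{n}, \\ 0 & \text{if } \varepsilon > \frac{1}{n}. \end{cases}$$

Hence, $\alpha^{R}_{(X, q_n, \mu)}$ converges to $0$ pointwise while $\alpha^{L}_{(X, q_n, \mu)}$ does not.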
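The two-point example can be checked by brute force. The Python sketch below assumes the illustrative value $\mu(\{a\}) = 2/3$ (any value in $(1/2, 1)$ behaves the same way; helper names are ours) and computes $\alpha^L_n(\varepsilon)$ and $\alpha^R_n(\varepsilon)$ at $\varepsilon = 1/2$ for several $n$: the left value stays at $\mu(\{b\})$, while the right value drops to $0$ as soon as $1/n < 1/2$.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)


def two_point_space(n, p=Fraction(2, 3)):
    """X = {a, b}, q_n(a, b) = 1, q_n(b, a) = 1/n, mu({a}) = p (illustrative p)."""
    q = {("a", "a"): 0, ("b", "b"): 0, ("a", "b"): 1, ("b", "a"): Fraction(1, n)}
    mu = {"a": p, "b": 1 - p}
    return q, mu


def left_right_alpha(q, mu, eps):
    """Brute-force (alpha^L(eps), alpha^R(eps)) for eps > 0."""
    pts = list(mu)
    aL = aR = Fraction(0)
    for r in range(1, len(pts) + 1):
        for A in combinations(pts, r):
            if sum(mu[x] for x in A) < HALF:
                continue
            left = [x for x in pts if min(q[(a, x)] for a in A) < eps]   # A^L_eps
            right = [x for x in pts if min(q[(x, a)] for a in A) < eps]  # A^R_eps
            aL = max(aL, 1 - sum(mu[x] for x in left))
            aR = max(aR, 1 - sum(mu[x] for x in right))
    return aL, aR


for n in (1, 2, 3, 10, 100):
    print(n, *left_right_alpha(*two_point_space(n), HALF))
# 1 1/3 1/3
# 2 1/3 1/3
# 3 1/3 0
# 10 1/3 0
# 100 1/3 0
```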



4. High dimensional pq-spaces are very close to mm-spaces

Most of the above concepts and results are generalisations of mm-space results. However, we now develop some results which are trivial in the case of mm-spaces. The main result is that, if both left and right concentration functions drop off sharply, the asymmetry is very small for most pairs of points (with respect to the product measure), and the quasi-metric is very close to a metric.

Definition 4.1. For a quasi-metric space $(X, q)$, the asymmetry is the map $\Gamma : X \times X \to \mathbb{R}$ defined by $\Gamma(x,y) = |q(x,y) - q(y,x)|$.

Obviously, $\Gamma = 0$ on a metric space. However, $\Gamma$ is also close to $0$ for high-dimensional spaces, that is, those pq-spaces for which both $\alpha^{L}$ and $\alpha^{R}$ decrease sharply near zero.

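Theorem 4.2. Let $(X, q, \mu)$ be a pq-space. For any $\varepsilon > 0$,

$$(\mu \otimes \mu)(\{(x,y) \in X \times X : \Gamma(x,y) \ge \varepsilon\}) \le \alpha^{L}\!\left(\tfrac{\varepsilon}{2}\right) + \alpha^{R}\!\left(\tfrac{\varepsilon}{2}\right).$$

Proof. Fix $a \in X$ and set, for each $x \in X$, $\gamma_a(x) = q(x,a) - q(a,x)$. It is clear that $\gamma_a$ is a sum of two left 1-Lipschitz maps and therefore left 2-Lipschitz. Furthermore, zero is its median since there is a measure-preserving bijection $(x,y) \mapsto (y,x)$ which maps the set $\{(x,y) \in X \times X : q(x,y) > q(y,x)\}$ onto the set $\{(x,y) \in X \times X : q(x,y) < q(y,x)\}$. By Corollary 3.7, applied to the left 1-Lipschitz function $\gamma_a / 2$, we have $\mu(\{x \in X : |\gamma_a(x)| \ge \varepsilon\}) \le \alpha^{L}(\frac{\varepsilon}{2}) + \alpha^{R}(\frac{\varepsilon}{2})$. Now, using Fubini's theorem,

$$\begin{aligned}
(\mu \otimes \mu)(\{(x,y) \in X \times X : |q(x,y) - q(y,x)| \ge \varepsilon\})
&= \int_{x \in X} \int_{y \in X} \mathbf{1}_{\{|\gamma_x(y)| \ge \varepsilon\}} \, d\mu(y) \, d\mu(x) \\
&\le \int_{x \in X} \left( \alpha^{L}(\tfrac{\varepsilon}{2}) + \alpha^{R}(\tfrac{\varepsilon}{2}) \right) d\mu(x) = \alpha^{L}(\tfrac{\varepsilon}{2}) + \alpha^{R}(\tfrac{\varepsilon}{2}).
\end{aligned}$$

□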



Thus, any pq-space where both $\alpha^{L}$ and $\alpha^{R}$ (and hence $\alpha$, by Lemma 3.3) sharply decrease is, apart from a set of very small measure, very close to an mm-space.

5. Examples 

5.1. Hamming Cube. 

Definition 5.1. Let $n \in \mathbb{N}$ and $\Sigma = \{0, 1\}$. The collection of all binary strings of length $n$ is denoted $\Sigma^n$ and called the Hamming cube.

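Definition 5.2. The Hamming distance (metric) for any two strings $\sigma = \sigma_1 \sigma_2 \ldots \sigma_n$ and $\tau = \tau_1 \tau_2 \ldots \tau_n \in \Sigma^n$ is given by

$$d_n(\sigma, \tau) = |\{i \in \mathbb{N} : \sigma_i \ne \tau_i\}|.$$

The normalised Hamming distance $\rho_n$ is given by

$$\rho_n(\sigma, \tau) = \frac{d_n(\sigma, \tau)}{n} = \frac{|\{i \in \mathbb{N} : \sigma_i \ne \tau_i\}|}{n}.$$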
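Definition 5.3. The normalised counting measure $\mu_n$ of any subset $A$ of a Hamming cube is given by

$$\mu_n(A) = \frac{|A|}{2^n}.$$

It is easy to see that the above definitions indeed give a set with a metric and a measure, and that $(\Sigma^n, \rho_n, \mu_n)$ is an mm-space. One may wish to consider $\Sigma^n$ as a product space, with $\rho_n$ as an $\ell_1$-type sum of discrete metrics on $\{0, 1\}$ (normalised by $n$) and $\mu_n$ an $n$-product of $\mu_1$, where

$$\mu_1(\{0\}) = \mu_1(\{1\}) = \tfrac{1}{2}.$$

The following bounds for the concentration function have been established:

Proposition 5.4. For the Hamming cube $\Sigma^n$ with the normalised Hamming distance $\rho_n$ and the normalised counting measure $\mu_n$, we have

$$\alpha_{(\Sigma^n, \rho_n, \mu_n)}(\varepsilon) \le \exp(-2\varepsilon^2 n).$$

Hence the sequence $\{(\Sigma^n, \rho_n, \mu_n)\}_{n=1}^{\infty}$ is a normal Lévy family.

5.1.1. Law of Large Numbers. An easy consequence of Proposition 5.4 is the well-known Law of Large Numbers.

Proposition 5.5. Let $(e_i)_{i \le N}$ be an independent sequence of Bernoulli random variables ($P(e_i = 1) = P(e_i = -1) = \frac{1}{2}$). Then for all $\varepsilon > 0$

$$P\left(\left|\frac{1}{N} \sum_{i=1}^{N} e_i\right| \ge \varepsilon\right) \le 2 \exp\left(-\frac{\varepsilon^2 N}{2}\right).$$

Equivalently, if $B_N$ is the number of ones in the sequence $(e_i)_{i \le N}$, then

$$P\left(\left|\frac{B_N}{N} - \frac{1}{2}\right| \ge \frac{\varepsilon}{2}\right) \le 2 \exp\left(-\frac{\varepsilon^2 N}{2}\right).$$

5.1.2. Asymmetric Hamming Cube. We will now produce a pq-space based on the Hamming cube by replacing $\rho_n$ by a quasi-metric. The simplest way is to define $q_1 : \Sigma \times \Sigma \to \mathbb{R}$ by $q_1(0,1) = 1$ and $q_1(1,0) = q_1(0,0) = q_1(1,1) = 0$, and set $q_n(\sigma, \tau) = \frac{1}{n} \sum_{i=1}^{n} q_1(\sigma_i, \tau_i)$.

The triple $(\Sigma^n, q_n, \mu_n)$ forms a pq-space. One immediately observes that $\{(\Sigma^n, q_n, \mu_n)\}_{n=1}^{\infty}$ forms a normal Lévy family: since $q_n(\sigma, \tau) + q_n(\tau, \sigma) = \rho_n(\sigma, \tau)$, we have $q_n \le \hat{q}_n \le \rho_n$ and $\bar{q}_n \le \hat{q}_n$, so every $\rho_n$-neighbourhood of a set is contained in its left, right and associated metric neighbourhoods, and all three concentration functions are therefore bounded by $\exp(-2\varepsilon^2 n)$ by Proposition 5.4.

Take two strings $\sigma$ and $\tau$ and let us consider the asymmetry $\Gamma_n(\sigma, \tau)$. It is easy to see that $\Gamma_n$ takes values between $0$ and $1$, being equal to the quantity

$$\frac{1}{n} \left|\, |\{i : \sigma_i = 0 \wedge \tau_i = 1\}| - |\{i : \sigma_i = 1 \wedge \tau_i = 0\}| \,\right|.$$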



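Since our asymmetric Hamming cube is a product space, we can consider, for each $i \le n$, the value $\delta_i = q_1(\sigma_i, \tau_i) - q_1(\tau_i, \sigma_i)$ as a random variable taking the values $0$, $-1$ and $1$ with $P(\delta_i = 0) = \frac{1}{2}$ and $P(\delta_i = -1) = P(\delta_i = 1) = \frac{1}{4}$, so that $\Gamma_n(\sigma, \tau) = \frac{1}{n} \left| \sum_{i \le n} \delta_i \right|$. Now,

$$\begin{aligned}
(\mu_n \otimes \mu_n)(\{(\sigma, \tau) \in \Sigma^n \times \Sigma^n : \Gamma_n(\sigma, \tau) \ge \varepsilon\})
&= P\left(\left|\sum_{i \le n} \delta_i\right| \ge n\varepsilon\right) \\
&\le P\left(\left|\sum_{i \le n} e_i\right| \ge n\varepsilon\right) \\
&\le 2 \exp\left(-\frac{n\varepsilon^2}{2}\right).
\end{aligned}$$

This is the same bound as would be obtained by application of Theorem 4.2 and Proposition 5.4.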
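For small $n$ the probability on the left-hand side can be computed exactly, which shows how conservative the exponential bound is. Because $\sum_{i \le n} \delta_i$ equals the number of ones in $\tau$ minus the number of ones in $\sigma$, $\Gamma_n(\sigma,\tau)$ depends only on these two counts. The Python sketch below (helper names are ours) computes the probability both by enumerating all pairs of strings and from binomial coefficients, and compares it with $2\exp(-n\varepsilon^2/2)$.

```python
import math
from fractions import Fraction
from itertools import product


def q_n(sigma, tau):
    """Asymmetric Hamming quasi-metric: share of positions with sigma_i = 0, tau_i = 1."""
    return Fraction(sum(1 for s, t in zip(sigma, tau) if s == 0 and t == 1), len(sigma))


def asymmetry(sigma, tau):
    """Gamma_n(sigma, tau) = |q_n(sigma, tau) - q_n(tau, sigma)|."""
    return abs(q_n(sigma, tau) - q_n(tau, sigma))


def tail_by_enumeration(n, eps):
    """(mu_n x mu_n)({Gamma_n >= eps}) by enumerating all 4^n pairs of strings."""
    cube = list(product((0, 1), repeat=n))
    hits = sum(1 for s in cube for t in cube if asymmetry(s, t) >= eps)
    return Fraction(hits, len(cube) ** 2)


def tail_by_counts(n, eps):
    """Same probability, using Gamma_n = | |tau| - |sigma| | / n and binomial weights."""
    hits = sum(math.comb(n, w1) * math.comb(n, w2)
               for w1 in range(n + 1) for w2 in range(n + 1)
               if Fraction(abs(w1 - w2), n) >= eps)
    return Fraction(hits, 4 ** n)


n, eps = 6, Fraction(1, 2)
exact = tail_by_enumeration(n, eps)
print(exact == tail_by_counts(n, eps), exact)    # True 299/2048
print(float(exact), 2 * math.exp(-n * float(eps) ** 2 / 2))
```

Here the exact value is about 0.146, while the bound evaluates to about 0.945; the bound only becomes informative for larger $n$.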



5.2. Penalties. Talagrand [22] obtained exponential bounds for product spaces endowed with a non-negative 'penalty' function generalising the distance between two points. Penalties form a much wider class of distances than quasi-metrics, but they provide ready bounds for the left and right concentration functions.

We will outline here just one of the results from [21] and apply it to obtain bounds for concentration functions in product quasi-metric spaces with product measure.

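Consider a probability space $(\Omega, \Sigma, \mu)$ and the product $(\Omega^N, \mu^N)$, where the product probability $\mu^N$ will be denoted by $P$. Consider a function $f : 2^{\Omega^N} \times \Omega^N \to \mathbb{R}_+$ which will measure the distance between a set and a point in $\Omega^N$. More specifically, given a function $h : \Omega \times \Omega \to \mathbb{R}_+$ such that $h(\omega, \omega) = 0$ for all $\omega \in \Omega$, set

$$f(A, x) = \inf\left\{ \sum_{i \le N} h(x_i, y_i) : y \in A \right\}.$$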

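Theorem 5.6 ([22]). Assume that

$$\|h\|_{\infty} = \sup_{x, y \in \Omega} h(x,y)$$

is finite and set

$$\|h\|_2 = \left( \iint_{\Omega^2} h^2(\omega, \omega') \, d\mu(\omega) \, d\mu(\omega') \right)^{1/2}.$$

Then

$$P(\{f(A, \cdot) \ge u\}) \le \frac{1}{P(A)} \exp\left( -\min\left( \frac{u^2}{8N\|h\|_2^2}, \frac{u}{2\|h\|_{\infty}} \right) \right).$$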



If we replace $h$ above by $q_{\Omega}$, a quasi-metric on $\Omega$, and endow $\Omega^N$ with the $\ell_1$-type quasi-metric $q$ such that, for $x, y \in \Omega^N$, $q(x,y) = \sum_{i \le N} q_{\Omega}(x_i, y_i)$, we have $f(A, x) = q(x, A)$ and the following corollary is obtained.



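Corollary 5.7. Suppose $\|q_{\Omega}\|_{\infty} < \infty$. Then

$$\alpha^{R}_{(\Omega^N, q, \mu^N)}(\varepsilon) \le 2 \exp\left( -\min\left( \frac{\varepsilon^2}{8N\|q_{\Omega}\|_2^2}, \frac{\varepsilon}{2\|q_{\Omega}\|_{\infty}} \right) \right).$$

Note that the same bound also applies to $\alpha^{L}_{(\Omega^N, q, \mu^N)}$, and hence to both concentration functions, because the norms referred to above are symmetric: they do not change when $q_{\Omega}$ is replaced by its conjugate $\bar{q}_{\Omega}$.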
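As a worked instance of Corollary 5.7, take $\Omega = \{0,1\}$ with the uniform measure and $q_{\Omega} = q_1$ from Section 5.1.2, so that $q$ on $\Omega^N$ is $N$ times the quasi-metric $q_N$ of the asymmetric Hamming cube. The Python sketch below (helper names are ours) computes $\|q_1\|_\infty = 1$ and $\|q_1\|_2 = 1/2$ and evaluates the bound. With $\varepsilon = Nt$ for $0 < t \le 1$, the quadratic term is the smaller one and the bound reduces to $2\exp(-Nt^2/2)$.

```python
import math


def norms(q_omega, points, mu):
    """(||q||_inf, ||q||_2) of a quasi-metric on a finite probability space."""
    sup_norm = max(q_omega(a, b) for a in points for b in points)
    l2 = math.sqrt(sum(q_omega(a, b) ** 2 * mu[a] * mu[b]
                       for a in points for b in points))
    return sup_norm, l2


def corollary_5_7_bound(eps, N, sup_norm, l2):
    """2 exp(-min(eps^2 / (8 N ||q||_2^2), eps / (2 ||q||_inf)))."""
    return 2 * math.exp(-min(eps ** 2 / (8 * N * l2 ** 2), eps / (2 * sup_norm)))


q1 = lambda s, t: 1 if (s, t) == (0, 1) else 0   # q_1 from Section 5.1.2
sup_norm, l2 = norms(q1, (0, 1), {0: 0.5, 1: 0.5})
print(sup_norm, l2)                              # 1 0.5

N, t = 100, 0.2                                  # eps = N t on the unnormalised scale
print(corollary_5_7_bound(N * t, N, sup_norm, l2), 2 * math.exp(-N * t ** 2 / 2))
```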